AITopics | cluster quality

Collaborating Authors

cluster quality

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Bipartite Stochastic Block Models with Tiny Clusters

Stefan Neumann

Neural Information Processing SystemsNov-20-2025, 19:11:13 GMT

We study the problem of finding clusters in random bipartite graphs.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.14)
Europe > Austria > Vienna (0.14)
North America > United States (0.14)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Identifying bias in cluster quality metrics

Renedo-Mirambell, Martí, Arratia, Argimiro

arXiv.org Artificial IntelligenceJul-8-2025

We study potential biases of popular cluster quality metrics, such as conductance or modularity. We propose a method that uses both stochastic and preferential attachment block models construction to generate networks with preset community structures, to which quality metrics will be applied. These models also allow us to generate multi-level structures of varying strength, which will show if metrics favour partitions into a larger or smaller number of clusters. Additionally, we propose another quality metric, the density ratio. We observed that most of the studied metrics tend to favour partitions into a smaller number of big clusters, even when their relative internal and external connectivity are the same. The metrics found to be less biased are modularity and density ratio.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.7717/peerj-cs.1523

2112.06287

Country: Europe > Spain (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

Add feedback

A Computational Approach to Improving Fairness in K-means Clustering

Zhou, Guancheng, Xu, Haiping, Xu, Hongkang, Li, Chenyu, Yan, Donghui

arXiv.org Artificial IntelligenceJun-3-2025

Clustering is an important problem in data mining. It aims to split the data into groups such that data points in the same group are similar while points in different groups are different under a given similarity metric. Clustering has been successfully applied in many practical applications, such as data grouping in exploratory data analysis, search results categorization, market segmentation etc. Clustering results are often used for further analysis or interpretation. However, directly applying results obtained from usual clustering algorithms may suffer from fairness issues-some cluster may favor data points from one of the subpopulations, i.e., having disproportionally more points. One example of 1 Figure 1: Illustration of the fairness issue in clustering, Points of different color indicate different traits on a sensitive variable, e.g., gender where blue indicates male and red female. Cluster 1 is dominated by females while Cluster 2 by males. Points with an arrow indicate that we might switch its cluster membership assignment to make the clusters less dominated by one subpopulation.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.22984

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

Add feedback

Novel Topological Machine Learning Methodology for Stream-of-Quality Modeling in Smart Manufacturing

Lee, Jay, Ji, Dai-Yan, Hsu, Yuan-Ming

arXiv.org Artificial IntelligenceApr-23-2024

This paper presents a topological analytics approach within the 5-level Cyber-Physical Systems (CPS) architecture for the Stream-of-Quality assessment in smart manufacturing. The proposed methodology not only enables real-time quality monitoring and predictive analytics but also discovers the hidden relationships between quality features and process parameters across different manufacturing processes. A case study in additive manufacturing was used to demonstrate the feasibility of the proposed methodology to maintain high product quality and adapt to product quality variations. This paper demonstrates how topological graph visualization can be effectively used for the real-time identification of new representative data through the Stream-of-Quality assessment.

cluster quality, representative data, topological machine learning methodology, (15 more...)

arXiv.org Artificial Intelligence

2404.14728

Country: North America > United States > Ohio > Hamilton County > Cincinnati (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Architecture (0.89)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Enhancing Cluster Quality of Numerical Datasets with Domain Ontology

Heiyanthuduwage, Sudath Rohitha, Rahman, Md Anisur, Islam, Md Zahidul

arXiv.org Artificial IntelligenceApr-2-2023

Ontology-based clustering has gained attention in recent years due to the potential benefits of ontology. Current ontology-based clustering approaches have mainly been applied to reduce the dimensionality of attributes in text document clustering. Reduction in dimensionality of attributes using ontology helps to produce high quality clusters for a dataset. However, ontology-based approaches in clustering numerical datasets have not been gained enough attention. Moreover, some literature mentions that ontology-based clustering can produce either high quality or low-quality clusters from a dataset. Therefore, in this paper we present a clustering approach that is based on domain ontology to reduce the dimensionality of attributes in a numerical dataset using domain ontology and to produce high quality clusters. For every dataset, we produce three datasets using domain ontology. We then cluster these datasets using a genetic algorithm-based clustering technique called GenClust++. The clusters of each dataset are evaluated in terms of Sum of Squared-Error (SSE). We use six numerical datasets to evaluate the performance of our ontology-based approach. The experimental results of our approach indicate that cluster quality gradually improves from lower to the higher levels of a domain ontology.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2304.00653

Country:

Oceania > Australia (0.05)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)

Add feedback

Text Mining Through Label Induction Grouping Algorithm Based Method

Saleem, Gulshan, Ahmed, Nisar, Qamar, Usman

arXiv.org Artificial IntelligenceDec-15-2021

The main focus of information retrieval methods is to provide accurate and efficient results which are cost-effective too. LINGO (Label Induction Grouping Algorithm) is a clustering algorithm that aims to provide search results in form of quality clusters but also has a few limitations. In this paper, our focus is based on achieving results that are more meaningful and improving the overall performance of the algorithm. LINGO works on two main steps; Cluster Label Induction by using Latent Semantic Indexing technique (LSI) and Cluster content discovery by using the Vector Space Model (VSM). As LINGO uses VSM in cluster content discovery, our task is to replace VSM with LSI for cluster content discovery and to analyze the feasibility of using LSI with Okapi BM25. The next task is to compare the results of a modified method with the LINGO original method. The research is applied to five different text-based data sets to get more reliable results for every method. Research results show that LINGO produces 40-50% better results when using LSI for content Discovery. From theoretical evidence using Okapi BM25 for scoring method in LSI (LSI+Okapi BM25) for cluster content discovery instead of VSM, also results in better clusters generation in terms of scalability and performance when compares to both VSM and LSI's Results.

algorithm, lingo, lsi, (11 more...)

arXiv.org Artificial Intelligence

2112.08486

Country:

Asia > Pakistan > Punjab > Lahore Division > Lahore (0.06)
North America > United States > Hawaii (0.04)
Asia > Taiwan (0.04)
Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Add feedback

Bipartite Stochastic Block Models with Tiny Clusters

Neumann, Stefan

Neural Information Processing SystemsDec-31-2018

We study the problem of finding clusters in random bipartite graphs. We present a simple two-step algorithm which provably finds even tiny clusters of size $O(n^\epsilon)$, where $n$ is the number of vertices in the graph and $\epsilon > 0$. Previous algorithms were only able to identify clusters of size $\Omega(\sqrt{n})$. We evaluate the algorithm on synthetic and on real-world data; the experiments show that the algorithm can find extremely small clusters even in presence of high destructive noise.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.28)
Europe > Austria > Vienna (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Bipartite Stochastic Block Models with Tiny Clusters

Neumann, Stefan

Neural Information Processing SystemsDec-31-2018

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.28)
Europe > Austria > Vienna (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Document Clustering Evaluation: Divergence from a Random Baseline

De Vries, Christopher M., Geva, Shlomo, Trotman, Andrew

arXiv.org Artificial IntelligenceAug-29-2012

Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures cluster quality measures are performing work that prevents ineffective clusterings from giving high scores to clusterings that provide no useful result. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters to categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.

category, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1208.5654

Country: Oceania > Australia (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Add feedback